ABSTRACT

A compiler and method of compiling provide enhanced performance by inlining one or more frequently executed paths through a child procedure into a parent procedure without inlining the entire child procedure. Accordingly, a substantial improvement in speed of execution of the program can be achieved by reducing procedure call overhead, with reduced expense in terms of program size as compared to traditional inlining. Various criteria for determining whether to inline particular child procedures are also described.

PROFILE DRIVEN OPTIMIZATION OF FREQUENTLY EXECUTED PATHS WITH INLINING OF CODE FRAGMENT (ONE OR MORE LINES OF CODE FROM A CHILD PROCEDURE TO A PARENT PROCEDURE)

FIELD OF THE INVENTION

The invention relates to optimizing compilers and methods of compiling. More particularly, the invention relates to optimizing routines used in compiling which use inlining.

BACKGROUND OF THE INVENTION

Compilers are generally used to transform one representation of a computer program into another representation. Typically, but not exclusively, compilers are used to transform a human readable form of a program such as source code into a machine readable form such as object code.

A computer program suitable for compilation by a compiler is composed of a series of "statements". Some statements generate, modify, retrieve or store information. Other statements may control the flow of the program, for example, by testing the value of a variable and causing program flow to continue in different directions based on the value of the variable. In most programs of any significant length, the statements are collected into "procedures", which perform well-defined functions and can be used in potentially multiple places within the program. Frequently, the procedures in a large program are further collected into "modules", each of which is responsible for a particular major subset of the functions of the program. In a program structure of this kind, the compiler is used to compile the modules individually, after which the compiled modules are "linked" together to form a single, cohesive computer program. This approach allows the programmer to upgrade or debug, and then recompile, each module separately, without need for recompiling the other modules.

One type of compiler is an optimizing compiler which includes an optimizer for enhancing the performance of the machine readable representation of a program. Some optimizing compilers are separate from a primary compiler, while others are built into a primary compiler to form a multi-pass compiler. Both types of compilers may operate either on a human readable form, a machine readable form, or any intermediate representation between these forms.

One optimization technique is known as "profiling" the program. A program is profiled by compiling the program, and delivering it to a test environment which simulates actual field operation of the program. While the program operates in the test environment, records are kept on the extent to which certain sections of the program are used. After the test has been completed, the profile records are used by an optimizing compiler, to recompile the program in a manner which enhances the efficiency of the program. For example, one known technique is to place sections of the program which are used at approximately the same time, in nearby memory locations, so as to speed access to the program.

A common computer programming approach is known as procedural programming. In procedural programming, a program is broken into many small procedures, each including a sequence of statements (and in some cases, data), and each of which is responsible for particular well-defined activities. The procedures are invoked when particular actions are needed. Typically, procedures can invoke each other, as part of operation of the program. In such a situation, the procedure which is invoked is typically referred to as the "child" procedure, and the procedure which invokes the child procedure is referred to as the "parent" procedure.

While procedural programming can simplify programming effort and reduce complexity, one of the unfortunate results of a highly procedural computer program, is that the program, when operating, frequently transfers control between the various procedures (executes "procedure calls"). This creates a substantial overhead, in that each transfer of control between procedures requires multiple operations, both to transfer flow control to a procedure and to return flow from the procedure.

A similar unfortunate result occurs in so-called "object oriented" programming. In object oriented programming, data and a set of procedures (called "methods") are encapsulated together, and only the procedures encapsulated with data are permitted to modify that data. This type of programming naturally causes procedure calls to proliferate, typically to a greater extent than procedural programming.

To address the problem of high procedure call overhead, modern compilers optimize programs so as to avoid procedure calls. One optimization approach is to "inline" procedures, that is, to copy the entire body of the child procedure into the body of the parent procedure, at each location in the parent procedure where the child procedure is referenced. This is done only when the child procedure is relatively small and is called from relatively few locations, in order to minimize the extent to which the overall compiled program size is increased due to inlining.

SUMMARY OF THE INVENTION

Unfortunately, traditional inlining is often unable to substantially reduce the procedure call overhead of a program where to do so would be highly advantageous. In particular, when calling a procedure that usually executes only a small number of instructions before returning to its parent, the procedure call overhead is a large fraction of the cost of executing the procedure. For example, it is typical for a computer program to include a fairly large procedure that checks for error conditions. If no error condition has occurred, the procedure returns, after executing only a small number of statements. Only if there is an error condition, will any of the rest of the statements of the procedure be executed (e.g., to create screen displays warning of the error, etc.) If a procedure of this kind contains a significant number of instructions that are not usually executed, traditional methods are unlikely to inline the procedure.

The present invention builds on the recognition that most of the statements in many large procedures found in typical computer programs are rarely if ever executed, and provides a form of inlining optimization that appropriately handles large procedures in which most of the statements in the large procedure are rarely or never used.

In accordance with principles of the present invention, an optimizing compiler utilizes inlining to improve the performance of a computer program having a parent procedure which calls a child procedure. Instead of inlining an entire child procedure into the parent procedure, one or more selected paths through the child procedure are identified and inlined into the parent procedure, without inlining at least one path through the child procedure into the parent procedure. Because only one or more selected paths through the child procedure have been inlined, the procedure call overhead of the program can be reduced, at a reduced total increase in program size as compared to total inlining of the child procedure.

In disclosed specific embodiments, the computer program is evaluated to determine path frequencies for each procedure prior to operation of the optimizing compiler. The optimizing compiler uses the path frequency data to select the most frequently traversed path(s) through the child procedure. The most frequently traversed path(s) through the child procedure is(are) then inlined into the parent procedure. All paths which deviate from the most frequently traversed path(s) of the child procedure, are then replaced with fixup code. In the disclosed specific embodiment, the fixup code includes a call to a complete, compiled version of the original child procedure.

In the disclosed specific embodiment, a child procedure whose frequently executed path(s) has(have) side effects other than writes to automatic storage locations, is not inlined. Furthermore, when inlining a child procedure whose frequently executed path(s) has(have) writes to automatic storage locations, the inlined frequently executed path(s) of the child procedure is(are) modified prior to inlining, so that the writes to automatic storage locations refer to copies of the automatic storage locations used in the original child procedure. These steps ensure that whenever flow through the child procedure deviates from the inlined path(s), flow can be re-directed (through the fixup code) to the complete original child procedure, without any unwanted residual side effects of the initial partial pass through the inlined path(s) of the child procedure.

In the specific disclosed embodiment, a child procedure whose most frequently executed path(s) exceeds a predetermined size, is not inlined. Furthermore, a child procedure which deviates from its most frequently executed path(s) more than a predetermined percent of the time, is not inlined. The predetermined size and percentage can be adjusted so that the frequently executed path(s) of a child procedure is(are) inlined only when the result will improve overall performance of the program.

In a further aspect, the invention features a computer system for compiling a computer program into a machine-readable representation, comprising an optimizer that optimizes the computer procedure into an optimized representation by identifying frequently traversed path(s) through the child procedure, and inlining the frequently traversed path(s) of the child procedure into the parent procedure in place of the procedure call from the parent procedure to the child procedure, without inlining at least one other path through the child procedure.

In still a further aspect, the invention features a program product configured to optimize a computer procedure by inlining frequently traversed path(s) in accordance with the aspects described above, and a signal bearing media bearing the program, which may be a transmission type media or a recordable media.

These and other advantages and features, which characterize the invention, are set forth in the claims annexed hereto and forming a further part hereof. However, for a better understanding of the invention, and the advantages and objectives attained by its use, reference should be made to the Drawing, and to the accompanying descriptive matter, in which there are described embodiments of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer system consistent with the invention.

FIG. 2 is a flow chart of specific operations performed as part of an optimization of a computer program using inlining in accordance with principles of the present invention.

FIG. 2A is a flow chart of specific operations performed as part of inlining a frequently executed path of a child procedure into a parent procedure, where appropriate.

FIG. 2B is a flow chart of specific operations performed as part of caching a frequently executed path of a child procedure, where appropriate.

DETAILED DESCRIPTION

Prior to discussing the operation of embodiments of the invention, a brief overview discussion of compilers and compiling techniques is provided herein.

Overview of Compilers

Compilers and the like are generally known in the art. One known type of compiler is a multi-pass optimizing compiler, which includes a front-end for converting source code into an intermediate representation, and a back-end which takes the intermediate representation and generates object code.

The front-end of a multi-pass optimizing compiler typically includes a lexicographic analyzer which identifies tokens or keywords in the source code, and a parser which analyzes the program statement by statement. The parser typically uses a context-free grammar to determine if program statements satisfy a set of grammar rules, and builds constructs. The parser then generates an intermediate representation using an intermediate code generator.

The back-end of a multi-pass optimizing compiler typically includes an optimizer which operates on the intermediate representation to generate a revised or optimized intermediate representation. Several different optimizations may be performed, including but not limited to local optimizations such as value numbering, elimination of redundant computations, register allocation and assignment, instruction scheduling to match specific machine characteristics, moving invariant code out of loops, strength reduction, induction variable elimination, and copy propagation, among others. The back-end also includes a final code generator to generate the object code from the revised intermediate representation.

One of the tasks typically performed by a compiler is to generate, from the source code or intermediate representation of a computer program, object code that handles the allocation and use of memory space. Memory space to represent a variable may be allocated in different ways according to specifications made in the source code. For example, a variable declared "static" will be allocated a permanent memory location throughout the execution of the program. However, a variable declared to be "automatic" will only be allocated a memory location when the procedure containing the variable declaration is invoked. When this procedure finishes executing, the memory locations assigned to its automatic variables are relinquished for possible reuse by another procedure. This means that no assumptions can be made about the contents of an automatic variable's memory location when a procedure begins executing; indeed, the location used for an automatic variable is likely to be different during separate invocations of the same procedure.

A compiler may reside within the memory of the computer system upon which the object code generated by the compiler is executed. Alternatively, a compiler may be a cross-compiler which resides on one computer system to generate object code for execution on another computer system. Either type of compiler may be used consistent with the invention.

One suitable back-end for use with the invention is an AS/400 optimizing translator supplied with an AS/400 minicomputer, which is a common back-end of an optimizing compiler. This product may be used with a front-end such as the ILE C Compiler available from IBM, among others. It will be appreciated that other compilers are suitable for different languages and/or different hardware platforms, and may also be used in the alternative.

Computer System 

Turning to the Drawing, wherein like numbers denote like parts throughout the several views, FIG. 1 shows a block diagram of a computer system 20 consistent with the invention. Computer system 20 is an IBM AS/400 minicomputer. However, those skilled in the art will appreciate that the mechanisms and apparatus consistent with the invention apply equally to any computer system, regardless of whether the computer system is a complicated multi-user computing apparatus or a single user device such as a personal computer or workstation. As shown in FIG. 1, computer system 20 includes a main or central processing unit (CPU) 22 connected through a system bus 21 to a main memory 30, a memory controller 24, an auxiliary storage interface 26, and a terminal interface 28.

Auxiliary storage interface 26 allows computer system 20 to store and retrieve information from auxiliary storage such as magnetic disk, magnetic tape or optical storage devices. Memory controller 24, through use of a processor separate from CPU 22, moves information between main memory 30, auxiliary storage interface 26, and CPU 22. While for the purposes of explanation, memory controller 24 is shown as a separate entity, those skilled in the art understand that, in practice, portions of the function provided by memory controller 24 may actually reside in the circuitry associated with CPU 22 and main memory 30. Further, while memory controller 24 of the embodiment is described as having responsibility for moving requested information between main memory 30, auxiliary storage interface 26 and CPU 22, those skilled in the art will appreciate that the mechanisms of the present invention apply equally to any storage configuration, regardless of the number and type of the storage entities involved.

Terminal interface 28 allows system administrators and computer programmers to communicate with computer system 20, normally through programmable workstations. Although the system depicted in FIG. 1 contains only a single main CPU and a single system bus, it will be understood that the invention also applies to computer systems having multiple CPUs and buses.

Main memory 30 is shown storing a compiler 40 (comprising analyzer 42, parser 44, optimizer 46 and code generator 48) and operating system 32. Memory 30 also includes a workspace 50, which is shown storing a computer program in various stages of compilation, including a source code representation 52, an intermediate representation 54, a revised and optimized representation 56 and object code 58. However, it should be understood that main memory 30 will not necessarily always contain all parts of all mechanisms shown. For example, portions of compiler 40 and operating system 32 will typically be loaded into caches in CPU 22 to execute, while other files may well be stored on magnetic or optical disk storage devices. Moreover, the various representations 52-58 of a computer program may not be resident in the main memory at the same time. Various representations may also be created by modifying a prior representation in situ. In addition, as discussed above, the front-end and back-end of the compiler, in some systems, may be separate programs.

It will be appreciated that computer system 20 is merely an example of one system upon which the routines in accord with principles of the present invention may execute. Further, as innumerable alternative system designs may be used, principles of the present invention are not limited to any particular configuration shown herein.

In general, the routines executed to implement the illustrated embodiments of the invention, whether implemented as part of an operating system or a specific application, program, object, module or sequence of instructions will be referred to herein as "computer programs". The computer programs typically comprise instructions which, when read and executed by one or more processors in the devices or systems in a computer system consistent with the invention, cause those devices or systems to perform the steps necessary to execute steps or generate elements embodying the various aspects of the present invention. Moreover, while the invention has and hereinafter will be described in the context of fully functioning computer systems, those skilled in the art will appreciate that the various embodiments of the invention are capable of being distributed as a program product in a variety of forms, and that the invention applies equally regardless of the particular type of signal bearing media used to actually carry out the distribution. Examples of signal bearing media include but are not limited to recordable type media such as volatile and non-volatile memory devices, floppy disks, hard disk drives, CD-ROM's, DVD's, magnetic tape, etc., and transmission type media such as digital and analog communications links.

Use of Computer System 

Referring now to FIG. 2, in accordance with principles of the present invention, an optimizing compiler operates upon a computer program including one or more procedures by initially identifying each of the procedures in the program (step 100). Next, an estimate of the relative frequencies of paths through each procedure is gathered (step 102). Such an estimate may be obtained using one of several methods known to those skilled in the art. For example, dynamic profiling may be employed to gather frequencies from sample executions of the program; or heuristics may be used during compilation to make static estimates of path frequencies without run-time information; or directives may be placed by the programmer in the source code of the program, indicating which paths are expected to be most frequently executed. Any method of estimating path frequencies may be used within the spirit and scope of the invention.
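
By way of illustration only, the dynamic profiling option mentioned above might be sketched in Python as follows. The sketch is not part of the disclosed embodiment: the procedure `check_input`, the table `path_counts` and the helper `relative_frequencies` are hypothetical names, and recording a path as a tuple of block labels is merely one assumed representation.

```python
from collections import Counter

# Hypothetical profile store: (procedure name, path) -> number of executions.
path_counts = Counter()


def check_input(value):
    """Toy procedure, instrumented to record the path taken on each call."""
    path = ["entry"]
    if value < 0:
        path.append("error")   # branch expected to be rare
        result = "invalid"
    else:
        path.append("ok")      # branch expected to be common
        result = "valid"
    path.append("exit")
    path_counts[("check_input", tuple(path))] += 1
    return result


def relative_frequencies(procedure):
    """Return {path: fraction of recorded calls} for one procedure."""
    counts = {p: n for (name, p), n in path_counts.items() if name == procedure}
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()} if total else {}


# A sample execution standing in for a run in the test environment.
for sample in [3, 8, -1, 5, 12, 7, 0, 9, 4, 6]:
    check_input(sample)

frequencies = relative_frequencies("check_input")
hottest_path = max(frequencies, key=frequencies.get)
```

In this sample run nine of the ten calls follow the "ok" path, so that path would be the candidate for inlining. Static heuristics or programmer directives could populate the same table without any execution.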

After estimating path frequencies, a call graph is constructed for the program being compiled (step 104). There is a "node" in the call graph for each procedure in the program, and an "arc" from a node A to a node B if and only if the procedure represented by node A contains code that invokes the procedure represented by node B.
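
A minimal sketch of such a call graph, under the assumption that the compiler has already extracted, for each procedure, the names of the procedures it invokes, might look as follows. The function `build_call_graph` and the procedure names are hypothetical and serve only to illustrate the node and arc definitions given above.

```python
def build_call_graph(calls):
    """Build a call graph from {procedure: iterable of invoked procedures}.

    Returns {node: set of nodes reached by an outgoing arc}.
    """
    graph = {proc: set() for proc in calls}
    for caller, callees in calls.items():
        for callee in callees:
            graph.setdefault(callee, set())  # one node per procedure
            graph[caller].add(callee)        # arc from caller to callee
    return graph


# Hypothetical program: two procedures share an error-checking child.
calls = {
    "main": ["parse", "report"],
    "parse": ["check_errors"],
    "report": ["check_errors"],
    "check_errors": [],
}
graph = build_call_graph(calls)
```

Here `check_errors` has no outgoing arcs and two incoming arcs, which makes it a child of both `parse` and `report`.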

Additional terminology relative to the call graph can be useful in further understanding the use of the call graph. When a first node in the call graph can be reached by following one or more arcs from a second node in the call graph through zero or more intermediate nodes, then the first node in the call graph is referred to as a "descendant" of the second node, and the second node is referred to as an "ancestor" of the first node. A descendant-ancestor relationship between two nodes indicates that the procedure represented by the descendant node can be invoked as part of executing the ancestor node.

If generated in exactly the above manner, the call graph can include recursion; that is, it may be possible to follow arcs in the graph around a loop of nodes. Recursions of this kind must be eliminated, and this is done in step 104 by not creating arcs from a node to an ancestor of that node. This approach prevents recursive arcs from being stored in the call graph.
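
The ancestor test and the suppression of recursive arcs described above can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: `is_ancestor` and `build_acyclic_call_graph` are hypothetical names, arcs are examined in the order in which the input lists them, and which arc of a loop is dropped therefore depends on that order.

```python
def is_ancestor(graph, ancestor, node):
    """True if `node` is reachable from `ancestor` via one or more arcs."""
    stack = list(graph.get(ancestor, ()))
    seen = set()
    while stack:
        current = stack.pop()
        if current == node:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return False


def build_acyclic_call_graph(calls):
    """Build a call graph, omitting any arc from a node to itself or to one of its ancestors."""
    graph = {proc: set() for proc in calls}
    for caller, callees in calls.items():
        for callee in callees:
            graph.setdefault(callee, set())
            # An arc back to the caller or to an ancestor would close a loop.
            if callee == caller or is_ancestor(graph, callee, caller):
                continue
            graph[caller].add(callee)
    return graph


# Hypothetical program with direct and mutual recursion.
calls = {"main": ["walk"], "walk": ["visit", "walk"], "visit": ["walk"]}
graph = build_acyclic_call_graph(calls)
```

In this example the self-call in `walk` and the call from `visit` back to `walk` produce no arcs, so the resulting graph contains no loop of nodes.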

After generating the call graph, a loop including steps 108, 110, 112, 114 and 116 is performed for each of the nodes in the call graph. The nodes are selected (step 106) in a reverse topological order, that is, each node is selected only after all of its descendant nodes have been selected. The reverse topological selection of the nodes can be achieved by passing through the nodes and selecting only those nodes which do not have any outgoing arcs. On the next and each subsequent pass through the nodes, those nodes which were not previously selected and only have arcs pointing to previously selected nodes, are selected. This process continues passing through the nodes, until all nodes have been selected.

For each selected node (and the procedure it represents), the optimizing process of FIG. 2 proceeds to step 108, where the node is evaluated to determine whether it has any outgoing arcs. This evaluation is equivalent to determining whether the procedure represented by the node calls any other procedures as identified by the call graph.

If in step 108, the current node has an outgoing arc, then the procedure represented by the current node calls a child procedure. In this case, the optimizing process proceeds to step 110, in which the frequently executed path(s) of the child procedure is(are) inlined into the procedure represented by the current node, if appropriate. As will be discussed below in connection with FIG. 2A, only some child procedures are inlined into their parent procedures, based on various criteria to be discussed in reference to FIG. 2B.

After step 110, or immediately after step 108 if the procedure represented by the current node in the call graph does not call any child procedures, the optimizing process determines whether the procedure identified by the current node has at least one parent procedure as described by the call graph (step 112).

If there is an arc from any node in the call graph to the current node, then the procedure represented by the current node has a parent in the call graph. In this case, the optimizing process proceeds to step 114, in which the frequently executed path(s) of the procedure represented by the current node in the call graph, is(are) cached into memory, if appropriate. As will be discussed below in connection with FIG. 2B, only some child procedures in the call graph are cached into memory for subsequent inlining into their parent procedures, based on predetermined criteria to be discussed below.

After step 114, or immediately following step 112 if the procedure represented by the current node in the call graph does not have any parent procedures, processing proceeds to step 116, in which it is determined whether there are any more nodes in the call graph to be processed, i.e., whether all procedures in the program have been processed. If there are more procedures to be processed, the optimizing process returns to step 108; otherwise, the optimizing process is completed (step 118).

Referring now to FIG. 2A, as discussed above, in step 110 of the optimizing process of FIG. 2, the frequently executed path(s) of a child procedure is(are) inlined into a parent procedure, if appropriate. As discussed in more detail below with reference to FIG. 2B, those child procedures which are eligible for inlining into parent procedures, have their frequently executed path(s) cached into memory, along with various identifying information. If a child procedure is ineligible for inlining, it is not cached into memory.

Accordingly, in a first step 130, the optimizing process determines whether the child procedure to be potentially inlined, has (a) frequently executed path(s) cached into memory. If not, then the child procedure is chosen not to be inlined, and step 110 is completed (step 144).

If there is a frequently executed path for the child procedure cached into memory, then the child procedure may be inlined into its parent procedure. In this case, the optimizing process proceeds to step 132.

In step 132, a "fixup" block is created and added to the program. The fixup block contains only those statements which are needed to invoke the original, complete child procedure. The fixup block is created in order to simplify the frequently executed path(s) of the child procedure which is(are) to be inlined into the parent procedure.

Specifically, in step 134, the cached frequently executed path(s) of the child procedure is(are) modified by inserting branches to the fixup block at each point (statement) in the frequently executed path(s) where control flow falls from the frequently executed path(s). For example, if a frequently executed path includes an IF statement, and one branch of the IF statement is in the frequently executed path(s) and the other is not, then in the other branch, a branch to the fixup block is inserted. As noted above, the fixup block invokes the entire, unmodified child procedure. Thus, in those situations where flow through the child procedure departs from a cached frequently executed path of the child procedure, then the entire child procedure is re-executed in its original form. Thus, there is some speed and efficiency penalty associated with a circumstance which causes execution to depart from the frequently executed path(s) of an inlined child procedure; however, in accordance with the criteria discussed below in connection with FIG. 2B, this penalty is more than offset by the improvement in speed and efficiency obtained when execution flows through the frequently executed path(s) of an inlined child procedure.

After generating the fixup block and modifying the cached frequently executed path(s), the optimizing process proceeds to step 136, in which the optimizing process determines whether the memory cache contains a list of automatic storage locations which are generated by the child procedure. As discussed above, automatic storage locations are generated by a procedure when the procedure causes data to be stored in temporary memory locations, which are not used outside of the procedure. A compiler recognizes the generation of an automatic storage location based on the syntax of the computer programming language in use. A compiler may also generate code to allocate automatic storage locations not specified in the source code, in order to hold temporary results of intermediate calculations.

If there are statements in the cached frequently executed path(s) of the child procedure that access data in automatic storage locations, then the parent procedure must be modified to allocate corresponding automatic storage locations, and the inlined copy of the frequently executed path(s) must be modified to access the parent's locations rather than those of the child. This is necessary, since the automatic storage locations of the child will not exist while executing the inlined code. If flow through the inlined code departs from the child's frequently executed path(s), the call to the original child procedure will not be affected by modifications to the parent's copy of the automatic storage locations; the child will instead use its own newly allocated storage locations.

To avoid this difficulty, if in step 136, there is a cached list of automatic storage locations in memory, the optimizing process proceeds to step 138, in which duplicates are made of the automatic storage locations referenced in the cached list. (Typically, the parent procedure is modified to allocate storage in its invocation stack frame to contain the duplicated storage locations.) Then, in step 140, the cached frequently executed path(s) through the child procedure is(are) modified so that all references to automatic storage locations refer to the duplicates of those locations made in step 138.

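
The effect of steps 132 through 140 can be illustrated at source level with the following Python sketch. It is a hand-written analogy under stated assumptions, not compiler output: `check_errors`, `parent_original`, `parent_partially_inlined` and `status_copy` are hypothetical names, the frequently executed path is assumed to be the one on which no error code is present, and the counter `fixup_calls` exists only to make the saving visible.

```python
def check_errors(record):
    """Original child: a short common path and a longer rare path."""
    status = "ok"                      # automatic (local) storage of the child
    if record.get("code", 0) != 0:     # rare path: build a detailed message
        status = "error %d" % record["code"]
        status += " in " + record.get("source", "unknown")
    return status


def parent_original(records):
    """Parent before optimization: one procedure call per record."""
    return [check_errors(r) for r in records]


def parent_partially_inlined(records):
    """Parent after inlining only the child's frequently executed path."""
    results = []
    fixup_calls = 0
    for r in records:
        # Inlined hot path; `status_copy` duplicates the child's `status`
        # so that the inlined code writes only to the parent's own storage.
        status_copy = "ok"
        if r.get("code", 0) != 0:
            # Fixup block: control left the hot path, so the complete,
            # unmodified child is invoked from its beginning.
            fixup_calls += 1
            results.append(check_errors(r))
            continue
        results.append(status_copy)
    return results, fixup_calls


records = [{"code": 0}, {"code": 0}, {"code": 7, "source": "disk"}, {}]
expected = parent_original(records)
actual, fixup_calls = parent_partially_inlined(records)
```

For these four records both parents produce the same results, while the partially inlined parent calls the child only once, for the single record that leaves the frequently executed path. Because the inlined path writes only to `status_copy`, restarting the child from its beginning in the fixup block leaves no residual effect of the partial pass.
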
Following step 140, or immediately after step 136 if there
are no cached lists of automatic storage locations for the
frequently executed path(s) of the child procedure, the
optimizing process proceeds to step 142, in which the
modified frequently executed path(s) of the child procedure
are inlined into the parent procedure at each point
(statement) in the parent procedure which calls the child
procedure. Once this is completed, the inlining of the child
procedure is done (step 144).

Referring now to FIG. 2B, the activities involved in step
114, to cache the frequently executed path(s) of a child
procedure, if appropriate, are described. In a first step 150,
the path frequency data for the current procedure (generated
in step 102) is retrieved. Next, in step 152, this path
frequency data is analyzed to identify the most frequently
executed path(s) through the procedure.

As discussed below, in one method for identifying path
frequencies for a procedure, a count is taken of the number
of times that each possible path is taken through the procedure
during test execution of the program. The most frequently
executed path(s) through the procedure are those
paths which accumulate the greatest counts during test
execution. There may be two or more parallel paths which
are executed at nearly similar frequency; accordingly, there
may be more than one frequently executed path through the
program which is identified in step 152.

In one embodiment of the present invention, only the most
frequently executed path is identified in step 152, without
regard to other paths through the procedure which are nearly
as frequently executed. In another approach to step 152, if
the most frequently executed path through the procedure has
fewer statements than the predetermined threshold size used
in step 154 (see below), then the next-most-frequently
executed path is also included in the selected paths if it can
be included without exceeding the threshold size used in
step 154, then the next-most-frequently executed path after
that is also included in the selected paths if it can be included
without exceeding the threshold size used in step 154, and so on,
until no further next-most-frequently executed parallel paths
can be included without exceeding the threshold maximum
number of statements established in step 154. For relatively
small procedures, this latter method might ultimately incorporate
the entire small procedure, and all paths through the
procedure, into the selected path(s) identified in step 152.
Thus, for small procedures, the entire procedure is inlined in
accordance with the principles of the present invention just
as is done in conventional inlining processes. However,
unlike conventional inlining, for larger procedures, only
some of the path(s) through the procedure are selected in
step 152, and ultimately inlined into the parent procedure as
discussed above and in more detail below.

After identifying the frequently executed path(s) of the
procedure, in step 154, the number of statements (or object
code size, if measurable) of the path(s) is compared to a
threshold. If the size of the frequently executed path(s) of the
procedure exceeds the threshold, then the procedure will not
be inlined, and no further action is taken to cache the
procedure (step 164). The threshold is chosen to prevent
inlining of frequently executed path(s) which are so large
that the efficiency gained by inlining the path(s) is insufficient
to justify the increased program size. In one
embodiment, this threshold may be a fixed number of
statements or object code instructions; in another
embodiment, this threshold may be adjustable based on the
number of locations in which the frequently executed path(s)
will be inlined in parent procedures, and/or the size of
memory available in a particular computer system for which
the program is intended, or other factors. In a further
embodiment, this threshold may be specifiable by the user.

If the size of the frequently executed path(s) does not
exceed the threshold, then processing proceeds to step 156,
in which the frequently executed path(s) is(are) evaluated to
determine whether there are any side effects in the path(s)
other than writes to automatic storage locations. Two kinds
of side effects are of concern: modifications of memory, and
exceptions. If a non-automatic memory location is modified
by the frequently executed path(s), then it is possible that
when flow falls off of the inlined path(s), and the entire
original child procedure is started over, the result will not be
the same as calling the original procedure directly. (For
example, the child procedure might read the modified storage
location.) Similarly, if the frequently executed path(s)
can cause an exception (e.g., overflow, divide by zero, illegal
storage reference, etc.), then the effect of falling off the
inlined frequently executed path(s) and calling the original
child procedure over again might have different semantics
than calling the original child procedure directly. So if the
frequently executed path(s) contain(s) these kinds of side
effects, no further action is taken to cache the procedure
(step 164).

If there are no side effects in the procedure other than
writes to automatic storage locations, then processing proceeds
to step 158, in which the path frequency data is
evaluated to determine the frequency with which passes
through the procedure are expected to deviate from the
frequently executed path(s) through the procedure that were
identified in step 152. For example, if the ratio of the frequency
of the frequently executed path(s) to the frequency of all paths is
less than a threshold percentage, then the optimizing process
proceeds to step 164, and the frequently executed path(s) of
the procedure are not cached. On the other hand, if the ratio
of the frequency of the frequently executed path(s) to the
frequency of all paths is greater than a threshold percentage,
then the optimizing process proceeds to step 160, to begin
the process of caching the selected frequently executed
path(s) into memory. The threshold frequency used in step
158 is selected to prevent inlining of child procedures which
deviate so frequently from their selected frequently executed
path(s) that the efficiency gain obtained from inlining the
frequently executed path(s) of the procedure is more than
offset by the efficiency penalties associated with repeating
the complete child procedure in those instances where
execution of the procedure deviates from the frequently
executed path(s).

Assuming the selected frequently executed path(s) pass
the tests of steps 154, 156 and 158, the optimizing process
proceeds to step 160, in which the frequently executed
path(s) is(are) cached into memory for later inlining into
parent procedures, as discussed above. Next, in step 162, a
list of automatic storage locations used in the cached frequently
executed path(s) is cached into memory, for later use
when inlining the frequently executed path(s) into parent
procedures, as discussed above. After these steps, processing
of the procedure is completed (step 164).

Following the foregoing procedures, a computer program
may be optimized by inlining one or more paths through a
child procedure into a parent procedure, without inlining the
entirety of the child procedure, resulting in substantially
improved speed of execution of the program with reduced
expense in terms of program size.

It will therefore be appreciated that the invention provides
significant advantages in terms of optimization of computer
procedures during compilation, resulting in more efficient
compilation. It will also be appreciated that numerous modifications
may be made to the disclosed embodiments consistent
with the invention, without departing from the spirit
and scope of the invention. For example, while in the
foregoing, the use of a fixup procedure is described when
program flow deviates from an inlined frequently executed
path, other fixup mechanisms could be used, provided that
the combined inlined code and fixup code are semantically
equivalent to the original procedure call. Therefore, the
invention lies in the claims hereinafter appended.
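The caching decision of FIG. 2B (steps 152 through 158) can be summarized in a short sketch. It is not the disclosed implementation: the `Path` record, the function names, and the parameters `max_size` and `min_ratio` are hypothetical. The sketch also assumes that path sizes simply add (statements shared between paths are counted twice) and that selection stops at the first path that does not fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    count: int          # times the path was taken during test execution
    size: int           # number of statements on the path
    side_effects: bool  # writes non-automatic storage or may raise an exception


def select_paths(paths, max_size):
    """Step 152, multi-path variant: add paths in order of decreasing count
    while the combined size stays within the step 154 threshold."""
    chosen, used = [], 0
    for p in sorted(paths, key=lambda p: p.count, reverse=True):
        if used + p.size > max_size:
            break  # assumption: stop at the first path that does not fit
        chosen.append(p)
        used += p.size
    return chosen


def paths_to_cache(paths, max_size, min_ratio):
    """Return the paths to cache, or an empty list if nothing is cached."""
    chosen = select_paths(paths, max_size)
    if not chosen:                               # step 154: hottest path too large
        return []
    if any(p.side_effects for p in chosen):      # step 156: side-effect test
        return []
    total = sum(p.count for p in paths)
    covered = sum(p.count for p in chosen)
    # Step 158: a ratio exactly equal to the threshold is treated as passing,
    # which the text leaves unspecified.
    if total == 0 or covered / total < min_ratio:
        return []
    return chosen


profile = [
    Path("a", count=90, size=5, side_effects=False),
    Path("b", count=8, size=4, side_effects=False),
    Path("c", count=2, size=30, side_effects=True),
]
cached = [p.name for p in paths_to_cache(profile, max_size=10, min_ratio=0.9)]
```

With this profile, paths `a` and `b` fit within ten statements, have no disqualifying side effects, and cover 98 of 100 recorded executions, so both would be cached; the large, rarely taken path `c` is left to the original child procedure.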
What is claimed is: 

1. A method for optimizing a computer program including 
a child procedure and a parent procedure which includes one 
or more statements that invoke the child procedure, com- 
prising 

selecting one or more paths through the child procedure, 
the one or more paths including fewer than all of the 
paths through the child procedure, and 

inlining the one or more paths from the child procedure of 
said computer program into the parent procedure, in 
place of the one or more statements that invoke the 
child procedure without inlining into the parent proce- 
dure at least one reachable path through the child 
procedure that can be traversed upon invocation of the 
child procedure via the parent procedure. 

2. The method of claim 1 further comprising 
generating an estimate of path frequencies for one or more
procedures of the computer program prior to said
selecting and inlining, and wherein 
the one or more paths through the child procedure are 
selected in response to the estimate of path frequencies. 

3. The method of claim 2 wherein the estimate of path 
frequencies is generated by dynamic profiling of procedures 
of the computer program. 

4. The method of claim 2 wherein the estimate of path 
frequencies is generated by static heuristic analysis of pro- 
cedures of the computer program. 

5. The method of claim 2 wherein the estimate of path 
frequencies is generated from user-generated indications of 
frequently executed paths. 

6. The method of claim 2 wherein one or more frequently 
traversed paths through the child procedure is selected in 
response to the estimated path frequencies. 

7. The method of claim 2 wherein a most frequently 
traversed path through the child procedure is selected in 
response to the estimated path frequencies. 

8. The method of claim 2 further comprising 
determining from the estimated path frequencies whether
execution of the child procedure deviates from the one
or more selected paths through the child procedure
more often than a predetermined threshold, and
inlining said one or more selected paths through the child
procedure only if execution of the child procedure
deviates from the one or more selected paths less often
than the predetermined threshold.

9. The method of claim 1 further comprising replacing a 
path of the child procedure which deviates from the one or 
more selected paths with one or more statements that 
complete the original child procedure. 

10. The method of claim 9 further comprising generating
fixup code including one or more statements invoking the 
complete original child procedure, and wherein 

paths of the child procedure which deviate from the one 
or more selected paths are replaced with the fixup code. 

11. The method of claim 1, further comprising 
determining whether the one or more selected paths of the
child procedure has side effects other than storage of
data into temporary memory locations not used outside
of the child procedure, and 
inlining said one or more selected paths through the child 
procedure only if the one or more selected paths do not 
contain side effects other than storage of data into 
temporary memory locations not used outside of the 
child procedure. 

12. The method of claim 11, further comprising 
modifying the one or more selected paths through the
child procedure if the one or more selected paths
contain accesses to temporary memory locations not 
used outside of the child procedure, by altering the 
accesses to said temporary memory locations to refer to 
copies of the temporary memory locations identified in 
the original child procedure. 

13. The method of claim 1 further comprising 
determining whether the one or more selected paths
through the child procedure exceed a predetermined
size, and 

inlining said one or more selected paths through the child 
procedure only if the one or more selected paths do not 
exceed the predetermined size. 

14. A computer system for compiling a computer program 
including a child procedure and a parent procedure which 
includes one or more statements that invoke the child 
procedure, into a machine-readable representation, the
computer system comprising:

(a) an optimizer that optimizes the computer procedure 
into an optimized representation, the optimizer select- 
ing one or more paths through the child procedure, the 
one or more paths including fewer than all of the paths 
through the child procedure, and inlining the one or 
more paths from the child procedure of said computer 
program into the parent procedure, in place of the one 
or more statements that invoke the child procedure 
without inlining into the parent procedure at least one 
reachable path through the child procedure that can be 
traversed upon invocation of the child procedure via the 
parent procedure; and 

(b) a machine-readable code generator that generates a 
machine-readable representation of the computer pro- 
cedure from the optimized representation. 

15. The computer system of claim 14 wherein the opti- 
mizer also generates estimates of path frequencies for each 
procedure of the computer program prior to selecting and 
inlining the one or more paths through the child procedure, 
and selects the one or more paths in response to the 
generated estimated path frequencies. 

16. The computer system of claim 15 wherein the estimate 
of path frequencies is generated by dynamic profiling of 
procedures of the computer program. 

17. The computer system of claim 15 wherein the estimate 
of path frequencies is generated by static heuristic analysis 
of procedures of the computer program. 

18. The computer system of claim 15 wherein the estimate 
of path frequencies is generated from user-identified indi- 
cations of frequently executed paths. 

19. The computer system of claim 15 wherein the optimizer
selects a frequently traversed path through the child
procedure in response to the estimated path frequencies. 

20. The computer system of claim 14 wherein the opti- 
mizer replaces a path of the child procedure which deviates 
from the one or more selected paths with one or more 
statements that complete the original child procedure. 

21. A program product, comprising: 

(a) a program configured to optimize a computer program 
including a child procedure and a parent procedure 
which includes one or more statements that invoke the
child procedure, by 

selecting one or more paths through the child 
procedure, the one or more paths including fewer 
than all of the paths through the child procedure,

inlining the one or more paths from the child procedure 
of said computer program into the parent procedure, 
in place of the one or more statements that invoke the 
child procedure without inlining into the parent 
procedure at least one reachable path through the
child procedure that can be traversed upon invoca- 
tion of the child procedure via the parent procedure; 
and 

(b) a signal bearing media bearing the program. 

22. The program product of claim 21 wherein the program
also generates estimates of path frequencies for each procedure
of the computer program prior to selecting and
inlining the one or more paths through the child procedure,
and selects the one or more paths in response to the 
estimated path frequencies. 

23. The program product of claim 22 wherein the program 
selects a frequently traversed path through the child proce- 
dure in response to the estimated path frequencies. 

24. The program product of claim 21 wherein the program 
replaces a path of the child procedure which deviates from 
the one or more selected paths with one or more statements 
that complete the original child procedure. 

25. The program product of claim 21, wherein the signal 
bearing media is a transmission type media. 

26. The program product of claim 21, wherein the signal 
bearing media is a recordable media. 
